[SPARK-56284] Adding UDF worker specification protobuf definition #55165
sven-weber-db wants to merge 1 commit into apache:master
Conversation
cloud-fan left a comment
Summary
This PR replaces the placeholder UDFWorkerSpecification protobuf message with the full worker specification schema per SPIP SPARK-55278.
Design approach: Two proto files define a layered worker specification:
- common.proto — shared types for reuse by both the worker spec and the forthcoming UDF protocol: UDFWorkerDataFormat (data serialization format), UDFShape/SparkUDFShapes (UDF execution shapes).
- worker_spec.proto — the full specification: UDFWorkerSpecification composes WorkerEnvironment (lifecycle callables), WorkerCapabilities (data formats, UDF shapes, concurrency/reuse flags), and a DirectWorker (process callable + connection + timeout properties). Transport is abstracted via WorkerConnection (oneof of Unix domain socket or TCP).
Key design decisions:
- ProcessCallable separates command (executable prefix) from arguments, with the engine injecting --id and --connection at invocation time.
- oneof worker in UDFWorkerSpecification and oneof transport in WorkerConnection provide extension points for future worker provisioning strategies and transport types.
- WorkerCapabilities.supports_concurrent_udfs is defined but explicitly deferred for future use.
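The decisions above can be sketched roughly as follows. Message and field names come from this summary; the field numbers and comment wording are assumptions, not the merged definitions:

```protobuf
// Illustrative sketch only; field numbers are assumptions.
message ProcessCallable {
  // Executable prefix, e.g. ["python3", "-m"].
  repeated string command = 1;
  // User-supplied arguments. The engine injects --id and
  // --connection at invocation time, so both are reserved.
  repeated string arguments = 2;
}

message UDFWorkerSpecification {
  WorkerEnvironment environment = 1;
  WorkerCapabilities capabilities = 2;

  // Extension point for future worker provisioning strategies.
  oneof worker {
    DirectWorker direct_worker = 3;
  }
}
```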
General comments:
- udf/worker/README.md (line 23) still says "UDFWorkerSpecification -- currently a placeholder" — should be updated now that the specification is filled in.
- Spark Connect protos use (Required)/(Optional) annotations on field comments to clarify the application-level contract. For fields like supported_data_formats (where the comment says "Every worker MUST at least support ARROW"), such annotations would make the requirement immediately visible to proto consumers.
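For illustration, the suggested annotation style on such a field might look like this; the field name comes from the review, while its number and placement are assumed:

```protobuf
// (Required) The data formats this worker supports.
// Every worker MUST at least support ARROW.
repeated UDFWorkerDataFormat supported_data_formats = 1;
```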
    // engine-configurable maximum time (e.g. 30 seconds).
    optional int32 graceful_termination_timeout_ms = 2;

    // The connection this [[UDFWorker]] supports. Note that a single
[[UDFWorker]] is not defined anywhere — no proto message, no Scala/Java class. The same dangling reference appears at lines 149 and 159. The closest entity is DirectWorker (line 101). Should these reference DirectWorker, or is UDFWorker a planned type not yet introduced?
Good catch! [[UDFWorker]] was the name we previously used for DirectWorker. Before raising this PR, it was renamed, and I seem to have forgotten to update all references in the text to the old name. This should be fixed now. Thank you!
    // ["\"echo 'Test'\""]
    //
    // Every executable will ALWAYS receive a
    // --id argument. This argument CANNOT be part of the below list of arguments.
The --id argument is explicitly reserved here ("CANNOT be part of the below list of arguments"), but --connection (injected by the engine per lines 130–134) has no such restriction documented. A user including --connection in their arguments would conflict with the engine-injected value. Consider adding the same reservation for --connection.
Yes, very good point. I have updated the description to a list of restricted values including both --id and --connection. Thank you!
    }
    }

    enum SparkUDFShapes {
SparkUDFShapes uses plural naming, while UDFWorkerDataFormat in the same file uses singular. Proto convention recommends singular enum names — consider SparkUDFShape.
Good catch, thank you! Fixed.
    // produces iterator to a batch of rows as output.
    MAP_PARTITIONS = 2;
Grammar — "a iterator" and missing article:
    - // produces iterator to a batch of rows as output.
    + // UDF receives an iterator to a batch of rows as input and
    + // produces an iterator to a batch of rows as output.
      MAP_PARTITIONS = 2;
Missed this - thank you!
    // Which types of UDFs this worker supports.
    // This should list all supported Shapes.
    // Of which shape a specific UDF is will be communicated
Awkward phrasing:
    - // Of which shape a specific UDF is will be communicated
    + // The shape of a specific UDF will be communicated
    // After this time, the worker process should have terminated itself.
    // Otherwise, the process will be forcefully killed using SIGKILL.
    //
    // The engine will use this timeout, if it does not exceed a
    - // The engine will use this timeout, if it does not exceed a
    + // The engine will use this timeout, if it does not exceed an
    }
    }

     // Communication between the engine and worker
Leading space before // — inconsistent with all other message-level comments:
    -  // Communication between the engine and worker
    + // Communication between the engine and worker
    // is done using a UNIX domain socket.
    //
    // On [[UDFWorker]] creation, a path to a socket
    // to listen on is passed as a argument.
    - // to listen on is passed as a argument.
    + // to listen on is passed as an argument.
    // ["python3", "-m"]
    // ["worker.bin"]
    // ["java", "worker.java"]
    // ["bin/bash", "-c"]
Missing leading /:
    - // ["bin/bash", "-c"]
    + // ["/bin/bash", "-c"]
sven-weber-db left a comment
Adjusted according to review comments
    // The UDF execution type/shape.
    message UDFShape {
      oneof shape {
Why this indirection? The shape also has nothing to do with Spark...
I revisited this part with @haiyangsun-db. Instead of using a UDFShape, we now propose a UDFProtoCommunicationPattern. This captures which communication pattern of the UDFProto (to be added) is supported by this UDF worker. For all initial versions, this will be a bidirectional stream of bytes. However, for simpler use cases, we might add new communication modes, such as Request->Response, in the future.
The UDFShape will be covered via the client-specific init message. E.g., for Python UDFs, the Python client can tell the Python worker how to invoke the UDF (batch iterator, row-wise, etc.).
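A minimal sketch of the proposed replacement, assuming enum value names that are not spelled out in the thread:

```protobuf
// Which communication patterns of the UDF protocol (to be added)
// this worker supports. Value names below are hypothetical.
enum UDFProtoCommunicationPattern {
  UDF_PROTO_COMMUNICATION_PATTERN_UNSPECIFIED = 0;
  // All initial versions use a bidirectional stream of bytes.
  BIDIRECTIONAL_BYTE_STREAM = 1;
  // Simpler modes such as Request->Response may be added later.
}
```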
    // UDF receives a row with 0+ columns as input
    // and produces a single, scalar value as output
    EXPRESSION = 1;
Name it SCALAR or ONE_TO_ONE?
Deprecated according to the above comment
    // UDF receives an iterator to a batch of rows as input and
    // produces an iterator to a batch of rows as output.
    MAP_PARTITIONS = 2;
Deprecated according to the above comment
    // Examples:
    // /tmp/channel-uuid.sock
    // /some/system/path/channel-1234.sock
    message UnixDomainSocket {}
I am guessing you are going to add a path to this proto at some point?
No, the path is injected into the worker callable via the --connection parameter. All properties in this .proto are client-supplied, but the path should remain in control of the engine.
I added a separate message for the UDS for future extension points. For UDS itself, it is unlikely that we will ever need an extension. However, for TCP, we might need to specify additional properties in the future. To make the proto consistent, I propose using the same structure for all transport types.
    // Examples:
    // 8080
    // 1234
    message TcpConnection {}
If this is local only, then name it as such?
Same question as before. I assume you are going to add a port here?
Yes, good point. I renamed it LocalTcpConnection.
As for the port: This is the same as for the UDS. The port is engine-controlled and passed as a --connection flag to the UDF worker callable. This LocalTcpConnection message is there to capture additional TCP properties that the client might control in the future.
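Putting the two transport threads together, the messages might look like this. This is a sketch based on the author's replies; the field numbers are assumptions:

```protobuf
// The client chooses the transport; the concrete socket path or
// port stays engine-controlled and is passed to the worker via
// the --connection flag.
message WorkerConnection {
  oneof transport {
    UnixDomainSocket unix_domain_socket = 1;
    LocalTcpConnection local_tcp_connection = 2;
  }
}

// Empty today; kept as a message so UDS-specific properties
// could be added later without breaking the oneof.
message UnixDomainSocket {}

// Empty today; future client-controlled TCP properties could
// be added here.
message LocalTcpConnection {}
```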
    /// Worker specification
    ///
    message UDFWorkerSpecification {
      WorkerEnvironment environment = 1;
WorkerEnvironment looks like it is in the wrong place. It uses a lot of ProcessCallables which only seem to make sense for DirectWorker. Why not make it part of the direct worker to begin with?
The environment is not only needed for the DirectWorker. If we support a daemon-based worker creation in the future, this daemon would also need to be installed and set up in some way, which can be ensured through the environment. Therefore, I think this is not solely a property of the DirectWorker, but of all possible worker-creation modes we might support in the future.
Ok, let me ask a different question then: Are there going to be different ways in which a customer can set up a worker environment? If there are, then we should at least make this a oneof... (which we can do later on as well).
    }

    // Capabilities used for query planning
    message WorkerCapabilities {
IMO it would be a good idea to give this a somewhat self-describing name.
Do you have a concrete name in mind?
For me, Capabilities already captures the notion of this message pretty well. This message contains everything the engine needs to plan and execute a UDF using this worker. E.g., it describes what the UDF worker is capable of (which data formats are supported, whether it supports concurrency, etc.).
I did update the comment from

    // Capabilities used for query planning

to

    // Capabilities used for query planning
    // and running the worker during query execution.

to reflect that these properties are not only for query planning but also for execution.
    // - Any needed dependencies are present
    //
    // (Optional)
    optional ProcessCallable environment_verification = 2;
Document that verification failure means that we need installation. We should be very specific about the error codes here, so we can also add support for fatal conditions (e.g. no GPU, incompatible CPU arch, ...).
    // can subsequently locate and launch the worker process.
    //
    // (Required if environment_verification is given)
    optional ProcessCallable installation = 1;
Document that installation failure is fatal?
    // └─────────┘
    //
    // All scripts are optional.
    // However, if a verification script is supplied, an installation
Unless the verification script checks for a fatal condition.
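For context, the lifecycle fields under discussion fit together roughly like this. This is a sketch assembled from the quoted diff hunks; the failure-handling comments reflect the reviewers' open questions, not settled behavior:

```protobuf
message WorkerEnvironment {
  // Sets up the worker so the engine can subsequently locate
  // and launch the worker process.
  // Open question: installation failure is presumably fatal.
  // (Required if environment_verification is given)
  optional ProcessCallable installation = 1;

  // Verifies the environment, e.g. that any needed dependencies
  // are present. Open question: error codes should distinguish
  // "needs installation" from fatal conditions (no GPU,
  // incompatible CPU arch, ...).
  // (Optional)
  optional ProcessCallable environment_verification = 2;

  // Tears the environment down again. Open question: is this
  // also run after a failed installation?
  // (Optional)
  optional ProcessCallable environment_cleanup = 3;
}
```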
Merging this to master. We can address further concerns in follow-ups.
Yicong-Huang left a comment
Left some comments, mainly on protobuf defs and lifecycle problems. We can address them in follow-ups.
    // engine-configurable maximum time (e.g. 30 seconds).
    //
    // (Optional)
    optional int32 initialization_timeout_ms = 1;
Would the timeout be better as a uint to avoid negative values?
or maybe protobuf.Duration
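The google.protobuf.Duration alternative would look roughly like this. This is a sketch of the reviewer's suggestion, not the merged definition; the enclosing message is assumed:

```protobuf
import "google/protobuf/duration.proto";

message DirectWorker {
  // Well-known type instead of int32 milliseconds: it carries
  // its unit in the type and avoids hand-rolled unit suffixes.
  optional google.protobuf.Duration initialization_timeout = 1;
}
```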
    // (Optional)
    optional ProcessCallable environment_cleanup = 3;
When installation fails, do we also clean up the partially set-up environment?
    // engine implements more advanced resource management (TBD).
    //
    // (Optional)
    optional bool supports_concurrent_udfs = 3;
If not used right now, we can just reserve the field number (https://protobuf.dev/programming-guides/proto3/), e.g.:

    message Foo {
      reserved 2, 15, 9 to 11;
    }
    // ▼ ▼
    // ...
    // UDF worker creation
    // ...
It is not clear how installation failures are handled. Also (not related to this figure), it is not clear how worker creation failures are handled.
What changes were proposed in this pull request?
This PR introduces the protobuf definitions for the UDF worker specification described in SPIP SPARK-55278 and this design document.
Overall, two new .proto files are introduced:
- common.proto - Shared types and messages between the worker specification & the new UDF protocol (to be introduced)
- worker_spec.proto - UDF worker specification

Why are the changes needed?
This is the first step toward a language-agnostic UDF protocol for Spark that enables UDF workers written in any language to communicate with the Spark engine through a well-defined specification and API boundary. The abstractions introduced here establish the core contract that concrete implementations (e.g., process-based or gRPC-based workers) will build on.
The worker specification introduced in this PR captures all the information Spark needs to:
Does this PR introduce any user-facing change?
No. All new APIs are marked @experimental, and there are no behavioral changes to existing code.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
Yes, in an assistive manner and for reviews.